(57) ABSTRACT

The present invention provides a per-flow dynamic buffer management scheme for a data communications device. With per-flow dynamic buffer limiting, the header information for each packet is mapped into an entry in a flow table, with a separate flow table provided for each output queue. Each flow table entry maintains a buffer count for the packets currently in the queue for each flow. On each packet enqueuing action, a dynamic buffer limit is computed for the flow and compared against the buffer count already used by the flow to make a mark, drop, or enqueue decision. A packet in a flow is dropped or marked if the buffer count is above the limit. Otherwise, the packet is enqueued and the buffer count incremented by the amount used by the newly enqueued packet. The scheme operates independently of packet data rate and flow behavior, providing means for rapidly discriminating well-behaved flows from non-well-behaved flows in order to manage buffer allocation accordingly. Additionally, the present invention adapts to changing flow requirements by fairly sharing buffer resources among both well-behaved and non-well-behaved flows.

56 Claims, 9 Drawing Sheets 
PER-FLOW DYNAMIC BUFFER MANAGEMENT

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to internetworking systems and in particular to methods and apparatus for managing traffic flow in routers and switches.

2. Description of the Related Art

Internetworking encompasses all facets of communications between and among computer networks. Such communications data flow streams may include voice, video, still images, and data traffic. All have widely varying needs in terms of propagation delay (or latency) during transit through the network. Various systems and devices, both in hardware and in software, have attempted to deal with the plethora of data flow requirements present in modern internetworking systems.

One such scheme consists of attempting to regulate the traffic within the router or switch connecting multiple networks in the typical internetworking system at either the data link or network function levels. (The functions performed at each level are defined in the open systems interconnection (OSI) reference model. This model is well known in the art. See, e.g., Merilee Ford, et al., Internetworking Technologies Handbook, Cisco Press 1997.) Such schemes attempt to provide fair allocation of data throughput capacity (bandwidth) by allocating router buffer and/or queue space according to the type of packets in each flow stream received.

A particular problem in internetworking traffic regulation arises from the variety of traffic sources or flows presented to the router/switching device. Referring to FIG. 1, illustrating a high-level schematic view of the operation of a prior art router/switch 10, a number of input flows 20 are presented to the unit. These flows each consist of multiple packets of data, in a variety of sizes and presented at a variety of rates. Additionally, flows may be presented in different protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) and the related User Datagram Protocol (UDP), File Transfer Protocol (FTP), Terminal Emulation Protocol (Telnet), and Hypertext Transfer Protocol (HTTP). Other internetworking protocols are found in the literature, such as Merilee Ford, et al., Internetworking Technologies Handbook, Cisco Press 1997, incorporated herein by reference in its entirety. The packets are buffered in a buffer pool 30, which is typically random access memory (RAM). Buffering is accomplished according to the directives of a controller 60 and a buffer manager 25. The flows are sent to the proper output port 70 by way of a set of output queues 40 and a port scheduler 50, discussed below. Controller 60, buffer manager 25, and port scheduler 50 are conventionally implemented as one or more high speed microprocessors with associated interface circuitry. Buffer manager 25 and port scheduler 50 are also implemented as ASICs.

Some flows are well-behaved in the event of traffic congestion: when faced with packet drops (i.e., packets discarded deliberately by a downstream device due to congestion at that device), these "good" (robust) flows reduce their flow rates and send fewer packets per unit of time. Other flows, however, are not well-behaved. These non-adaptive "aggressive" flows (NAFs) do not throttle back the flow of packets to the router when they experience drops. This may be because the NAFs do not recognize the congestion, sometimes due to protocol incompatibilities, or (more likely) because they actually are trying to capture more router bandwidth. The latter situation arises particularly in flows sent by sources that consider themselves higher priority than all others (hence the term "aggressive"); such priority assumptions by one flow are often in error in the modern, highly heterogeneous networks seen today.

Several regulation schemes are known in the art. Broadly classified, these schemes fall into two types: queue-based and buffer-based.

In queue-based schemes, incoming flows are classified according to their actual priority, as determined by the receiving router, and assigned accordingly to output queues within the router. High priority flows, such as time-sensitive voice traffic, are placed in a queue that is read out more often. Low priority flows, such as file transfer protocol (FTP) or hypertext transfer protocol (HTTP) flows, are placed in queues that are read out of the router at a slower rate. Numerous schemes, discussed below, are used to control the flows and enqueuing methods to achieve a measure of throughput balance or fairness among flows, thus using router/switch bandwidth as efficiently as possible. As will be seen, however, all of these schemes have drawbacks in cost, capacity, and efficiency that suggest a better scheme is needed.

In the extreme, queue-based flow management assigns one queue per input flow. Queues are read out of the router according to a statistically fair scheduling process, such as round-robin employing port scheduler 50. In round-robin scheduling, one packet is read out of each queue, one queue at a time, reading again from the first queue only when one packet has been read out from every other queue. This system is known as fair queuing (FQ), or weighted fair queuing (WFQ). While FQ and its variants operate well when the number and variety of input flows is small and well-behaved, they become inefficient when the number of flows grows. Clearly, a high number of flows requires a large number of queues, consuming a proportionally larger amount of resources, both in hardware and in operational complexity. More memory and more software processing overhead are required to set up and tear down the queues as flows begin and end. In the context of the modern, high volume networks seen today, this extra cost and complexity is undesirably inefficient.

A less extreme queue-based technique is random early drop (RED) and variants thereon. In a RED scheme, a smaller number of queues (less than the total number of input flows present at any time) is maintained. Flows are segregated into queues by flow volume, with a number of high volume flows placed in one queue. Each queue is managed according to a probabilistic flow rule that causes packets to be dropped more often in the queues associated with the heaviest flows. Because of this relationship, heavy flows experience packet drops more often, statistically, than other flows. This scheme achieves a measure of fairness, but it assumes that heavy flows will be well-behaved, i.e., that they will reduce flow rate when they experience packet drops. This assumption has proven to be erroneous in the modern heterogeneous network. Certain NAFs do not reduce flow rate and thus continue to take an unfair amount of router bandwidth simply because they counter packet drops with retransmissions. The "good" flows get less and less throughput as they reduce flow rate in response to drops while the NAFs capture more bandwidth.

As a further drawback, the random packet drops sometimes hit a fragile flow. These flows contain time-critical traffic of the highest priority, such as voice data. Fragile flows have the lowest tolerance for drops and delay, so a random packet drop management scheme can have a highly detrimental effect on them.

An alternative to managing router/switch traffic at the 
queue end is to manage flows at the buffer end, referring to 
buffer pool 30 of FIG. 1. The basic premise of buffer-based 
management is that if one limits how much of a particular 
input flow gets into buffers 30 relative to other input flows 
20, the output queues 40 will take care of themselves. Such 
limits on the number of packets buffered per flow can be 
either static or dynamic. 

In the static or strict limit scheme, a set maximum number 
of buffers is available for each flow. Any packets received
after those buffers are full are discarded. Static limits are set 
by the system administrator for each type of flow. However, 
this scheme has the obvious drawback of high overhead 
associated with setting up a gating mechanism for each flow 
and administrative oversight. Additionally, it lacks long- 
term flexibility to adapt to the wide variety and constantly 
changing mix of flow types seen in modern internetworking. 

Typical prior art systems implement static buffer limita- 
tion schemes in software with limited hardware support. All 
experience the same or similar drawbacks noted above due 
to overhead (set up and tear down, as well as processing time 
delay and hardware resource) costs. Furthermore, typical 
prior art systems implement buffer limitation schemes based 
on a limit that is imposed per output queue or per class of 
service required by the received packet. FIG. 2 illustrates the 
standard bit configuration for an Internet Protocol (IP) 
packet, including the fields within its header. Class of 
service information, sometimes referred to as flow type or 
flow classification, can be found in, for instance, the prece- 
dence or type of service (TOS) field 210 in the IP received 
packet header 200 or in the source address 220 or a com- 
bination thereof. These systems also either set their limit 
values from manually configured parameters or else update 
them at a relatively slow periodic rate compared to packet 
rates. 

Current schemes are unable to update their limit values 
fast enough to keep up with changing traffic conditions in the 
latest generation of ultra-fast (e.g., Gigabit speed) flows. As 
an additional drawback, the use of TOS field 210 is not 
standardized among internetworking users. Thus, neither 
TOS nor source address is a reliable means of identifying 
flow type at this time. 

What is needed is a scheme to rapidly identify good flows 
from bad (i.e., the well-behaved flows vs. the non-adapting 
aggressive flows) on a packet-by-packet basis. Furthermore, 
a flexible, low-overhead, extremely fast dynamic buffer 
limiting method and apparatus to fairly buffer and enqueue 
the wide variety of good flows and NAFs found in today's 
networks is also needed. 

SUMMARY OF THE INVENTION 

The present invention provides a per-flow dynamic buffer 
management scheme for a data communications device. 
With per-flow dynamic buffer limiting, the header informa- 
tion for each packet is mapped into an entry in a flow table, 
with a separate flow table provided for each output queue. 
Each flow table entry maintains a count of buffers currently 
in the queue for each flow. On each packet enqueuing action, 
a dynamic buffer limit is computed for the flow and com- 
pared against the number of buffers already used by the flow 
to make a mark, drop, or enqueue decision. A packet in a 
flow is dropped or marked if the buffer count is above this 
limit. Otherwise, the packet is enqueued and the count
incremented by the number of cells in the newly-enqueued 
packet. 
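The enqueue decision summarized above can be sketched in C as follows. This is a minimal illustration, not the patented implementation: the names `flow_table`, `dbl_enqueue`, and `dbl_dequeue`, the 16-bit table size, and the transmit-side step that decrements the count (and flowsInQueue when an entry empties) are assumptions made for the example. The limit is passed in rather than computed, and marking versus dropping is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size: 2^16 entries per output queue. */
#define FLOW_TABLE_BITS 16
#define FLOW_TABLE_SIZE (1u << FLOW_TABLE_BITS)

/* Marking versus dropping is a policy choice, so both map to one result here. */
typedef enum { DBL_ENQUEUE, DBL_MARK_OR_DROP } dbl_decision;

/* One flow table per output queue. */
typedef struct {
    uint32_t buffer_count[FLOW_TABLE_SIZE]; /* cells currently queued per entry */
    uint32_t flows_in_queue;                /* entries with a nonzero count */
} flow_table;

/* Enqueue side: mark/drop if the flow's count is above the limit,
   otherwise admit the packet and charge its cells to the flow. */
dbl_decision dbl_enqueue(flow_table *t, uint32_t idx, uint32_t cells,
                         uint32_t limit)
{
    assert(idx < FLOW_TABLE_SIZE);
    if (t->buffer_count[idx] > limit)
        return DBL_MARK_OR_DROP;
    if (t->buffer_count[idx] == 0)
        t->flows_in_queue++;          /* first buffers for this entry: a new flow */
    t->buffer_count[idx] += cells;
    return DBL_ENQUEUE;
}

/* Transmit side (assumed): return the cells and retire the flow when it empties. */
void dbl_dequeue(flow_table *t, uint32_t idx, uint32_t cells)
{
    assert(idx < FLOW_TABLE_SIZE && t->buffer_count[idx] >= cells);
    t->buffer_count[idx] -= cells;
    if (t->buffer_count[idx] == 0)
        t->flows_in_queue--;
}
```

A caller would compute the flow index and the current limit for each arriving packet and call `dbl_enqueue`; the transmit path later calls `dbl_dequeue` with the same index and cell count.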

The scheme operates independently of packet data rate and flow behavior, providing packet-specific means for rapidly discriminating well-behaved flows from aggressive, non-adapting (badly behaved) flows in order to manage buffer allocation accordingly. Additionally, the present invention adapts to changing flow requirements by fairly sharing buffer resources. The present invention handles robust, well-behaved flows that adapt to congestion situations signaled by packet drop, fairly sharing bandwidth among these flows. The present invention also ensures good service for fragile flows (those sending few packets and those of a time-critical nature) such as Voice-over-Internet Protocol (VoIP), thereby protecting them from non-adapting aggressive flows (NAFs).

BRIEF DESCRIPTION OF THE DRAWINGS 


The present invention may be better understood and its 
numerous objects, features, and advantages made apparent 
to those skilled in the art by referencing the accompanying 
drawings. 

FIG. 1 is a high-level schematic representation of data
flow and control in a prior art communications device. 

FIG. 2 is a bitmap of a prior art Internet Protocol (IP) 
packet showing the fields within its header. 

FIG. 3 is a flow chart of one embodiment of the enqueuing aspect of the present invention.

FIG. 4 is a flow chart of the process whereby data is read 
out of the queue and transmitted out into the network, 
according to one embodiment of the present invention. 

FIG. 5 is a flow chart of the process whereby DBL table 310 of FIG. 3 is created, according to one embodiment of the present invention.

FIG. 6 is a flow chart of the process whereby the step of tag packet 340 of FIG. 3 is accomplished, according to one embodiment of the present invention.

FIG. 7 is a flow chart of the process whereby the step of 
enqueue packet 330 of FIG. 3 is accomplished, according to 
one embodiment of the present invention. 

FIG. 8 is an alternate embodiment of the step of get DBL value 390 of FIG. 3.

FIG. 9 is an alternate embodiment of the step of comparison 395 of FIG. 3.

FIG. 10 is a further alternate embodiment of the step of 
comparison 395 of FIG. 3. 

The use of the same reference symbols in different drawings indicates similar or identical items.

DETAILED DESCRIPTION 

Overview 

The dynamic buffer limiting scheme of the present invention is based on two interrelated approaches. First, a mapping of packet header information is used to approximate the per-flow buffer state by storing a count of currently enqueued buffers for each flow into a flow table entry, rather than a separate queue per flow. Second, the dynamic buffer limit (DBL) is determined by lookup in a pre-existing (but frequently recalculated) table or by live computation, indexed by parameters representing the dynamic state of the internetworking device. The DBL is re-determined on each packet reception. The packet header mapping avoids the per-flow lookup, set up, and tear down overhead of prior art queue-based management schemes. The present invention also solves the prior art problem of inflexibility in revising queue limits according to the rapidly changing buffer usage and queue length conditions in the modern network. Working together, as further discussed below, these two approaches allow the system to rapidly distinguish well-behaved, robust ("good") flows from NAFs and to provide fair queuing and efficient router/switch resource utilization for all flows.

Although the terms router and/or switch will be used 
generally in this specification, those skilled in the art will 
realize that other related internetworking devices may be 
used, in addition to routers or switches, to perform analo- 
gous functions. Accordingly, the invention is not limited to 
any particular type of internetworking device, router, or 
switch. Also, although the primary focus of the current 
invention is Internet Protocol (IP) packet flows, those skilled
in the art will realize that protocols and flows other than IP,
such as Ethernet, can benefit from the present invention
and its alternate embodiments. Accordingly, the invention is 
not limited to any particular type of protocol or packet 
format. 

FIG. 3 illustrates the high-level process involved in 
queue-based management through dynamic buffer limiting, 
specifically focused on the computations and transforma- 
tions of the enqueuing operation. Upon receipt of a packet 
in a given flow, 300, the packet header is parsed 302 to 
determine the packet size, source address, destination 
address, and type of service (TOS). Additionally, the UDP 
source and destination port (for an IP packet) or the MAC 
source and destination and protocol type (for Ethernet 
packets) may be extracted as required to fully identify the 
necessary TOS. Refer to FIG. 2 for the bitwise locations of 
this information within the industry-standard IP packet 
header 200. The number of buffer elements or cells, which 
may be counted in terms of bytes or groups of bytes, 
required to buffer the incoming packet is computed (not 
shown). 
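The parsing step and the cell-count computation can be sketched for a raw IPv4 header using the standard field offsets of the header in FIG. 2 (type of service at byte 1, total length at bytes 2-3, source and destination addresses at bytes 12-19). The `flow_fields` struct, the function names, and the 64-byte cell size (taken from the cell unit discussed later in this description) are illustrative assumptions; options, port numbers, and non-IP formats are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define CELL_BYTES 64u   /* assumed cell size, matching the 64-byte cells discussed later */

typedef struct {
    uint8_t  tos;        /* type of service, byte 1 */
    uint16_t total_len;  /* total packet length in bytes, bytes 2-3 */
    uint32_t src, dst;   /* source and destination addresses, bytes 12-19 */
} flow_fields;

/* Read a big-endian (network order) 32-bit value. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Extract the flow-identifying fields from a raw IPv4 header (>= 20 bytes). */
flow_fields parse_ipv4(const uint8_t *hdr)
{
    flow_fields f;
    assert((hdr[0] >> 4) == 4);   /* version nibble must be 4 */
    f.tos = hdr[1];
    f.total_len = (uint16_t)((hdr[2] << 8) | hdr[3]);
    f.src = be32(hdr + 12);
    f.dst = be32(hdr + 16);
    return f;
}

/* Buffer cells needed for a packet, rounding any partial cell up. */
uint32_t cells_for_packet(uint32_t len_bytes)
{
    return (len_bytes + CELL_BYTES - 1) / CELL_BYTES;
}
```

For a 1,500-byte packet this yields 24 cells, since 1,500 / 64 is 23.4 and the partial cell is rounded up.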

All steps in the process of the present invention are 
implemented in a conventional router or switch system well 
known in the art, such as that depicted in FIG. 1. Other 
examples of such systems may be found in U.S. Pat. No. 
5,088,032, Method and Apparatus for Routing Communi- 
cations Among Computer Networks, to Leonard Bosack; 
U.S. Pat. No. 5,224,099, Circuitry and Method for Fair 
Queuing and Servicing Cell Traffic Using Hopcounts and 
Traffic Classes, to Corbalis, et al.; U.S. Pat. No. 5,359,592, 
Bandwidth and Congestion Control for Queue Channels in 
a Cell Switching Communication Controller, to Corbalis, et 
al; U.S. Pat. No. 5,473,607, Packet Filtering for Data 
Networks, to Hausman et al; and U.S. Pat. No. 5,561,663, 
Method and Apparatus for Performing Communication Rate 
Control Using Geometric Weighted Groups, to Daniel 
Klausmeier, incorporated in their entirety herein by refer- 
ence. 

One of ordinary skill in the art will recognize that the 
above described parsing step may be accomplished by either 
hardware or software means or a combination thereof, such 
as a lookup table. Accordingly, the present invention is not 
limited to any particular parsing means. 
Hash Mapping of Flows to Flow Entries 

In a substantially parallel process, the extracted header 
data is transformed by calculating a hash index 332 accord- 
ing to the following function, expressed in the C program- 
ming language: 
    hdr_ip* iph = (hdr_ip*)pkt->access(off_ip_);  // get pointer to IP header portion of packet
    int i = (int)iph->src();                       // get source IP address
    int j = (int)iph->dst();                       // get destination IP address
    int k = i + j;                                 // add src and dst
    // shift, add, divide modulo a large number; note that
    // (2 << 19) - 1 = 1,048,575 = 2^20 - 1 is composite, not prime
    return (k + (k >> 8) + ~(k >> 4)) % ((2 << 19) - 1);

Alternatively, the following function can be used to
calculate hash index 332:



    hdr_ip* iph = (hdr_ip*)pkt->access(off_ip_);  // get pointer to IP header portion of packet
    int i = (int)iph->src();                       // get source IP address
    int j = (int)iph->dst();                       // get destination IP address
    i = i ^ j;                                     // XOR src and dst
    i ^= i >> 16;                                  // shift high order to low order
    i ^= i >> 8;                                   // shift again
    return i;





The output of this function is an index to flow table 335 for the designated output queue for the given input flow. One of ordinary skill in the art will recognize the process of computing a table lookup index based on a limited range of inputs as a generic hash function (or hashing), novel here in the choice of both input parameters and the precise hash function performed on those inputs. Such hashing may be accomplished with hardware or software means or a combination thereof, as is well known in the art.
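For experimentation outside a network simulator, the two listings above can be restated as plain C functions over 32-bit addresses. This sketch assumes the operator in the first listing is bitwise NOT (`~`), and it switches to unsigned arithmetic so that the modulo cannot produce a negative index (with signed `int`, `k` can be negative and C's `%` would then return a negative value). The names `hash_sum` and `hash_xor` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First listing: add addresses, shift-and-add, reduce modulo (2 << 19) - 1. */
uint32_t hash_sum(uint32_t src, uint32_t dst)
{
    uint32_t k = src + dst;                       /* wraps modulo 2^32 */
    return (k + (k >> 8) + ~(k >> 4)) % ((2u << 19) - 1u);
}

/* Second listing: XOR addresses, then fold high-order bits into low-order bits. */
uint32_t hash_xor(uint32_t src, uint32_t dst)
{
    uint32_t i = src ^ dst;
    i ^= i >> 16;
    i ^= i >> 8;
    return i;
}
```

Both functions are symmetric in source and destination, so the two directions of a conversation share a flow table entry; in the XOR variant, any flow whose source equals its destination maps to index 0.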

The flow identifying information contained in the packet header (sometimes called the "flow label" in the art) is hashed in order to reduce the huge range of packet header values into a single compact, easily manipulated field having a far smaller range of values. Hashing avoids the per-flow lookup, set up, and tear down overhead of prior-art systems. For example, this embodiment of the present invention does not maintain flow table entries for each and every flow. Rather, of the $2^{160}$ possible flows uniquely identified by the first five 32-bit words in the IP packet header, the hash function limits the flow table to just $2^n$ entries, substantially fewer than in the unhashed situation. In other words, flow table 335 consists of $2^n$ entries, where n is the number of bits in the output of the hash function above. In one embodiment, a 19-bit hash index is used, supporting 512K entries. This provides the advantage of needing fewer bits to identify a table entry corresponding to a particular flow, thus reducing the overhead and resource cost of this particular embodiment over the prior art.

For smaller network applications, such as those on an enterprise scale, a flow table of $2^{16}$ or 64K entries (n = 16) appears to be sufficient, implying a hash function yielding 16 bits. For larger scale internetworking, such as Internet Service Provider (ISP) backbones, a flow table of at least 256K to 1M entries ($2^{18}$ to $2^{20}$ entries) should be provided to accommodate the large number of flows seen in such applications.
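Sizing the flow table at $2^n$ entries implies reducing the hash output to n bits. One simple way, assumed here rather than taken from the text, is to keep the low n bits; the XOR variant above already folds high-order address bits into its low-order bits, which makes those bits a reasonable choice. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries in a flow table indexed by an n-bit hash. */
uint32_t flow_table_entries(unsigned n)
{
    assert(n >= 1 && n <= 31);
    return 1u << n;
}

/* Keep the low n bits of a hash value as the flow table index. */
uint32_t flow_index(uint32_t hash, unsigned n)
{
    return hash & (flow_table_entries(n) - 1u);
}
```

With n = 16 the table has 65,536 entries (the enterprise case), with n = 19 it has 524,288 (512K), and with n = 20 it reaches 1,048,576 (1M).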

Although an IP packet is described, those skilled in the art will realize that datagrams or packets other than IP packets can be used. Other datagram formats are accommodated simply by determining the type of datagram received by methods well-known in the art, such as reading identifying data from the header, and applying the hash function described above to the appropriate data fields. Accordingly, the invention is not limited to any particular type of datagram.
The hashing embodiment of the present invention only
approximates a flow-specific management scheme, because 
multiple flows could potentially be mapped to the same flow 
table hash bucket. Such a situation is referred to as a hash 
collision. 

Hash collisions, in isolation, have little or no effect on the 
flows involved. Hash collisions are unlikely because a large
number of flow entries is provided and because NAFs
are expected to be a small percentage of flows. Each 
incoming packet is tested against the current count of stored 
buffers in the hash bucket. It will be marked, dropped, or 
enqueued accordingly, as expected. The fact that the count 
stored in the flow table bucket may be in error will have no 
real effect, because at worst a NAF packet, such as a 
high-speed User Datagram Protocol (UDP) flow with no rate 
adaptation on packet drop, will be enqueued a few times 
when it should have been limited. In the short term, this will 
result in some inefficiency, but (since the buffer count will be 
incremented twice as often), the flows will both soon be 
limited. 

Persistent hash collisions, on the other hand, as when a 
NAF and a fragile flow are mapped to the same bucket, will 
result in greater inefficiencies and unfair bandwidth alloca- 
tions. Though the probability of such an event is low, due to 
the relative rarity of both NAFs and hash collisions 
themselves, such a situation is undesirable. Persistent hash 
collisions between flows that happen to hash into the same 
bucket can be avoided by periodically changing the hash 
seed on the hash function 332 above (referring to FIG. 3) 
used to compute the hash index from the packet header. In 
a hardware implementation of the present invention, this 
change of hash seed may be under software control. Periodic 
change of hash seed is used because there is no way to 
determine whether a hash collision is occurring. Detecting 
collisions would require keeping explicit flow state at a 
significant implementation, and likely performance, cost. 

The hash index is stored in the packet descriptor field in 
the transmit (output) queue for later use in transmitting the 
packet (FIG. 4). The packet descriptor field also contains the 
packet length, rewrite information, and a pointer to the start 
of the data buffer for the packet. 

The ability to revise the hash seed is appropriate in any 
case to guard against potentially anomalous behavior at a 
particular installation. By storing the original hash index in 
the packet descriptor, 420, for each packet, the stored buffer 
count is updated correctly even if the hash seed was changed 
between packet reception and transmission. 
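The reseeding and descriptor mechanism can be sketched as follows. The seeded hash here is hypothetical: it mixes the seed in through multiplications by odd constants and keeps the top bits (a multiplicative hash), because simply XORing a seed into the XOR-folding hash above would not change which flows collide, since that fold is linear over XOR. `packet_descriptor`, `charge`, and `release` are illustrative names; the point is that `release` uses the index stored at enqueue time, so the count is restored correctly even if the seed changed in between.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BITS 16
#define TABLE_SIZE (1u << TABLE_BITS)

/* Hypothetical seeded hash: multiplications make the bucket assignment
   depend non-linearly on the seed, so reseeding reshuffles collisions. */
uint32_t seeded_index(uint32_t src, uint32_t dst, uint32_t seed)
{
    uint32_t x = (src ^ seed) * 0x9E3779B1u;
    x = (x ^ dst) * 0x85EBCA6Bu;
    return x >> (32 - TABLE_BITS);       /* top TABLE_BITS bits as the index */
}

typedef struct {
    uint32_t flow_index;   /* hash index computed when the packet was enqueued */
    uint32_t cells;        /* cells charged to that flow table entry */
} packet_descriptor;

/* Enqueue: charge the cells and remember which entry was charged. */
packet_descriptor charge(uint32_t counts[], uint32_t src, uint32_t dst,
                         uint32_t seed, uint32_t cells)
{
    packet_descriptor d = { seeded_index(src, dst, seed), cells };
    counts[d.flow_index] += cells;
    return d;
}

/* Transmit: credit back the stored entry, never a recomputed hash. */
void release(uint32_t counts[], const packet_descriptor *d)
{
    assert(counts[d->flow_index] >= d->cells);
    counts[d->flow_index] -= d->cells;
}
```

If the seed is changed while packets are queued, new arrivals hash with the new seed, while queued packets still release against the entry recorded in their descriptors.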

The above hashing scheme is one embodiment of map- 
ping a packet in a flow to an index identifying the associated 
flow table entry. This mapping can be realized by a number 
of other methods. As one alternative, the extracted header 
fields can be concatenated to form a key that is input to a 
content-addressable memory (CAM). If the key matches an 
entry in the CAM, the CAM returns the address of the first 
matching entry. This address can then be used directly as the 
index to identify the flow table entry. Alternatively, the value 
returned by the CAM can be used to address a second 
memory that provides the index to identify the flow table 
entry. In such an embodiment, if a key does not match in the 
CAM, a matching entry is allocated and initialized. A default 
entry or set of entries may be selected if the CAM is full. 
When a flow table entry is reduced to zero buffer usage, the 
associated CAM entry can be recorded as free, making it 
available for allocation to a new flow key. The matching of 
the key to entry can be an exact match using a binary CAM 
or partial match, using a ternary CAM. 
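The exact-match (binary) CAM alternative can be modeled in software as a small associative array searched in address order, so that the first matching address becomes the flow table index. This is a behavioral sketch under stated assumptions, not a hardware design: the four-entry size, the key fields, the `cam_table` layout, and the shared fallback entry used when the CAM is full (one possible reading of the "default entry") are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAM_ENTRIES 4              /* tiny, hypothetical CAM size */
#define DEFAULT_INDEX CAM_ENTRIES  /* hypothetical shared entry used when the CAM is full */

typedef struct { uint32_t src, dst; uint8_t tos; } flow_key;

typedef struct {
    bool     valid[CAM_ENTRIES];
    flow_key key[CAM_ENTRIES];
    uint32_t buffer_count[CAM_ENTRIES + 1]; /* flow table addressed by CAM address */
} cam_table;

static bool key_equal(const flow_key *a, const flow_key *b)
{
    return a->src == b->src && a->dst == b->dst && a->tos == b->tos;
}

/* Return the first matching address; on a miss, allocate and initialize
   a free entry; if none is free, fall back to the default entry. */
unsigned cam_lookup(cam_table *c, flow_key k)
{
    int free_slot = -1;
    for (unsigned a = 0; a < CAM_ENTRIES; a++) {
        if (c->valid[a] && key_equal(&c->key[a], &k))
            return a;
        if (!c->valid[a] && free_slot < 0)
            free_slot = (int)a;
    }
    if (free_slot < 0)
        return DEFAULT_INDEX;
    c->valid[free_slot] = true;
    c->key[free_slot] = k;
    c->buffer_count[free_slot] = 0;
    return (unsigned)free_slot;
}

/* Release cells; an entry that reaches zero usage is returned to the free pool. */
void cam_release(cam_table *c, unsigned idx, uint32_t cells)
{
    assert(idx <= DEFAULT_INDEX && c->buffer_count[idx] >= cells);
    c->buffer_count[idx] -= cells;
    if (idx != DEFAULT_INDEX && c->buffer_count[idx] == 0)
        c->valid[idx] = false;
}
```

A ternary CAM would replace `key_equal` with a masked comparison, letting one entry cover a range of keys.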

As a further alternative embodiment, the extracted header data can be concatenated to form a key that is input to a cache, organized as N sets of k entries each. The key is hashed to one of the sets and then matched to one of the k entries in the set. The address of the matching entry can be used as the index to the flow table entry or as an address into a second memory whose addressed entry then provides the index to the flow table entry. As with the CAM embodiment above, a new cache entry is allocated and initialized when the mapping of the packet in the cache fails, and an entry in the cache is deallocated when the corresponding flow table entry goes to zero buffer usage.

Although several mapping exemplars are described, one skilled in the art will realize that mappings other than the present examples can be used. Accordingly, this invention is not limited to any particular type of extracted header data to flow table entry mapping.
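The set-associative cache can be sketched in the same style: the key is hashed to one of N sets and compared against the k entries (ways) of that set, and the index `set * k + way` addresses the flow table. The geometry (8 sets of 2 ways), the multiplicative set hash, and the fallback index returned when a set is full are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_SETS 8                        /* hypothetical geometry: N sets ... */
#define K_WAYS 2                        /* ... of k entries each */
#define MISS_INDEX (N_SETS * K_WAYS)    /* hypothetical fallback when a set is full */

typedef struct { uint32_t src, dst; } flow_key;

typedef struct {
    bool     valid[N_SETS][K_WAYS];
    flow_key key[N_SETS][K_WAYS];
} flow_cache;

/* Assumed set hash: multiplicative hash of the combined key. */
static unsigned set_of(flow_key k)
{
    uint32_t x = (k.src ^ k.dst) * 0x9E3779B1u;
    return (x >> 16) % N_SETS;
}

/* Map a key to a flow table index, allocating a way on a miss. */
unsigned cache_lookup(flow_cache *c, flow_key k)
{
    unsigned s = set_of(k);
    int free_way = -1;
    for (unsigned w = 0; w < K_WAYS; w++) {
        if (c->valid[s][w] && c->key[s][w].src == k.src && c->key[s][w].dst == k.dst)
            return s * K_WAYS + w;
        if (!c->valid[s][w] && free_way < 0)
            free_way = (int)w;
    }
    if (free_way < 0)
        return MISS_INDEX;
    c->valid[s][free_way] = true;
    c->key[s][free_way] = k;
    return s * K_WAYS + (unsigned)free_way;
}

/* Called when the flow table entry at idx drops to zero buffer usage. */
void cache_free(flow_cache *c, unsigned idx)
{
    if (idx != MISS_INDEX)
        c->valid[idx / K_WAYS][idx % K_WAYS] = false;
}
```

Compared with a fully associative CAM, only k comparisons are needed per lookup, at the cost of misses when more than k active flows land in the same set.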

Dynamic Buffer Limit (DBL) Computation 

Meanwhile, also in a substantially parallel process, an index pointer into the pre-existing dynamic buffer limit (DBL) table 310 is computed, step 392, from the router/switch state parameters 345. This computation is according to the function:

$$\mathrm{DBL}_{\mathrm{index}} = \mathrm{maxQueueLen} \times \mathrm{flowsInQueue} + \mathrm{currentQueueLen}$$

where maxQueueLen 514 is a fixed router parameter limiting the length of any one queue, flowsInQueue is a count of the number of different flows currently in the output queue for the port, and currentQueueLen is a count of the current number of buffer elements in the queue.
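The index formula lays the DBL table out row by row, one row of maxQueueLen entries for each value of flowsInQueue. A minimal C version, assuming both flowsInQueue and currentQueueLen are kept below maxQueueLen so that every index falls inside a maxQueueLen × maxQueueLen table (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Row-major index into a maxQueueLen x maxQueueLen DBL table:
   one row per flowsInQueue value, one column per currentQueueLen value. */
uint32_t dbl_table_index(uint32_t max_queue_len, uint32_t flows_in_queue,
                         uint32_t current_queue_len)
{
    assert(flows_in_queue < max_queue_len && current_queue_len < max_queue_len);
    return max_queue_len * flows_in_queue + current_queue_len;
}
```

For example, with maxQueueLen = 2,048, three flows and 100 queued buffers select entry 3 × 2,048 + 100 = 6,244.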

In one embodiment of the present invention, DBL values are stored in a table for rapid lookup. FIG. 5 describes the process whereby the table is created and updated. For a given router/switch state 345,

$$\mathrm{DBL} = \frac{\mathrm{maxQueueLen}}{\mathrm{flowsInQueue}} \times \frac{K \times \mathrm{maxQueueLen}}{\mathrm{currentQueueLen}}$$

where K is a tuning parameter that adjusts DBL according to the instantaneously available queue space. This latter adjustment uses the factor maxQueueLen/currentQueueLen times tuning factor K to scale DBL; since maxQueueLen is always greater than or equal to currentQueueLen, this factor is at least 1 and grows as the queue empties. Parameter
maxQueueLen is an element of router/switch state 345. 
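The DBL formula and the FIG. 5 table-loading loop can be sketched as below, including the clamp to the user-specified bounds dblMin and dblMax (steps 520 and 530) described further below. Floating point is used for clarity; treating a zero flowsInQueue or currentQueueLen as 1, to avoid dividing by zero, is an assumption the text does not address, as are the function names.

```c
#include <assert.h>
#include <stdint.h>

/* DBL = (maxQueueLen / flowsInQueue) * (K * maxQueueLen / currentQueueLen),
   clamped to [dbl_min, dbl_max]. Zero inputs are treated as 1 (assumption). */
uint32_t compute_dbl(uint32_t max_queue_len, uint32_t flows_in_queue,
                     uint32_t current_queue_len, double k,
                     uint32_t dbl_min, uint32_t dbl_max)
{
    assert(max_queue_len > 0 && dbl_min <= dbl_max);
    double flows = flows_in_queue ? flows_in_queue : 1;
    double cur = current_queue_len ? current_queue_len : 1;
    double dbl = ((double)max_queue_len / flows) * (k * max_queue_len / cur);
    if (dbl < dbl_min)
        return dbl_min;                  /* protects fragile flows */
    if (dbl > dbl_max)
        return dbl_max;                  /* avoids over-committing the queue */
    return (uint32_t)dbl;
}

/* Load a table laid out row-major by flowsInQueue, then currentQueueLen. */
void load_dbl_table(uint32_t *table, uint32_t max_queue_len, double k,
                    uint32_t dbl_min, uint32_t dbl_max)
{
    for (uint32_t f = 0; f < max_queue_len; f++)
        for (uint32_t q = 0; q < max_queue_len; q++)
            table[f * max_queue_len + q] =
                compute_dbl(max_queue_len, f, q, k, dbl_min, dbl_max);
}
```

With maxQueueLen = 64, K = 0.5, four flows, and 32 queued cells, the limit is (64/4) × (0.5 × 64/32) = 16 cells; as the queue drains toward empty the second factor grows and the result is capped by dblMax.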
"Buffer elements" refer to the minimum unit of measure- 
ment of data storage in the router/switch and is typically a 
unit larger than a byte. Units of packets are not recommended, as packet size can vary enormously. Likewise, units of bytes are not recommended because too many bits would be required in the flow table to keep the count field. Testing has shown that units ("cells") of 64-byte groups reduce the bits required by six places and yet provide a more accurate count (than units of packets) with minimal inefficiencies.
Persons of ordinary skill in the art will of course recognize 
that other units are possible. As elsewhere, reference to 
counting units of "buffers" or "cells" is not intended to limit 
the present invention to any one particular unit size. 
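The saving from counting in 64-byte cells can be checked with a small helper: a count kept in cells needs log2(64) = 6 fewer bits than the same quantity kept in bytes. The helper names and the example per-flow maximum of 2,048 cells (131,072 bytes) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CELL_BYTES 64u   /* cell size reported in the text */

/* Bits needed to store any count from 0 through max_value. */
unsigned bits_for(uint64_t max_value)
{
    unsigned bits = 1;
    while (bits < 64 && (max_value >> bits) != 0)
        bits++;
    return bits;
}

/* Convert a count of cells into the equivalent number of bytes. */
uint64_t cells_to_bytes(uint64_t cells)
{
    return cells * CELL_BYTES;
}
```

A count of up to 2,048 cells fits in 12 bits, while the same 131,072 bytes need 18 bits, a difference of six.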

If a table of maxQueueLen × maxQueueLen entries is too large, the values of currentQueueLen and flowsInQueue can be divided by some constant, such as 2, 4, or another power of 2, so that the table is small enough to be practical. With a full-sized table, this table lookup is as good as computing the value on the spot, but it uses memory rather than random hardware logic or additional software instructions. As the table is reduced in size (by picking a larger constant divisor), the accuracy of the limit provided by DBL is reduced. However, similar shortcuts may be desired when fully computing DBL on each packet, because full multiplies and divides can be approximated to increase the speed and/or simplify the logic.
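Shrinking the table by dividing both inputs by $2^s$ amounts to right-shifting them before forming the index, which cuts the number of entries by a factor of $4^s$. A sketch, assuming maxQueueLen is a multiple of $2^s$; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Index into a DBL table whose two dimensions are each divided by 2^shift. */
uint32_t reduced_dbl_index(uint32_t max_queue_len, uint32_t flows_in_queue,
                           uint32_t current_queue_len, unsigned shift)
{
    uint32_t side = max_queue_len >> shift;   /* entries per row after scaling */
    assert(shift < 32 && side > 0 && (max_queue_len & ((1u << shift) - 1u)) == 0);
    assert(flows_in_queue < max_queue_len && current_queue_len < max_queue_len);
    return (flows_in_queue >> shift) * side + (current_queue_len >> shift);
}

/* Number of entries in the reduced table. */
uint32_t reduced_table_entries(uint32_t max_queue_len, unsigned shift)
{
    uint32_t side = max_queue_len >> shift;
    return side * side;
}
```

With maxQueueLen = 2,048 and a shift of 2, the table falls from 4,194,304 to 262,144 entries, but flowsInQueue values 4 through 7 now share one row, which is the loss of accuracy described above.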
Computing DBL without considering available queue space would be simpler but might excessively restrict bursts of packets in a given flow when there are only one or two packet flows in the queue, i.e., when flowsInQueue is small. Computing DBL without considering flowsInQueue would require DBL to ramp back too aggressively as the queue fills, because it would not be able to distinguish whether a small number of large flows or a large number of small flows is causing the congestion.

User-specified parameters dblMin and dblMax, referring to steps 520 and 530, are provided to constrain DBL to the range of values between dblMin and dblMax, independent of the above-computed value 510. The parameter dblMin protects fragile flows from dropping. A fragile flow is a flow that sends at a very low rate, such as VoIP or a Telnet session or the flow of TCP acknowledgment packets in the opposite direction to the data flow. Such a flow sends fewer than dblMin packets in the time required to empty the maximum length queue. For example, with dblMin=2, a queue length of 2,048 entries, a 1 Gigabit per second (Gbps) port and assuming an average packet size of 300 bytes, a fragile flow would be any flow having a data rate of less than 600 Kilobits per second (Kbps).

A dblMin value of 2 appears to be desirable for fragile flows of the type discussed in D. Lin and R. Morris, Dynamics of Early Detection, SIGCOMM '97, Cannes, France (Lin & Morris). Parameter dblMax simply prevents DBL from taking on unnecessarily large values; it should be substantially smaller than maxQueueLen. This prevents the queue from becoming over-committed to a single flow during a lull in other traffic.

The process of loading DBL table 310 is a multi-variable loop shown in FIG. 5. Since every queue is limited in length to maxQueueLen 514 cells, in its most congested state a queue can have up to maxQueueLen flows, given one cell per flow. Accordingly, flowsInQueue ranges from 1 to maxQueueLen and currentQueueLen ranges from 1 to maxQueueLen. Thus, DBL table 310 consists, in the worst case, of a maxQueueLen by maxQueueLen array, indexed by flowsInQueue and currentQueueLen.

Loading DBL table 310 begins by initiating for-next loop 563 for variable flowsInQueue 503 and for-next loop 567 for variable currentQueueLen 507. For each instance of (flowsInQueue, currentQueueLen), a DBL is computed, 510. Each DBL is tested against dblMin or dblMax as described above. The resulting value of DBL, limited to dblMin or dblMax as appropriate, is written 540 to DBL table 310 at the location indexed by (flowsInQueue, currentQueueLen). Variable currentQueueLen is incremented 557 and inner loop 567 repeats until variable currentQueueLen reaches maxQueueLen. At that time, variable flowsInQueue is incremented 553 and table filling proceeds on outer loop 563 until the entire table is filled.

Alternatively, the DBL value can be computed on the fly for each packet. FIG. 8 illustrates this process. Here, the DBL computation 510, with dblMin and dblMax tests 520 and 530, proceeds as above. However, maxQueueLen 514, flowsInQueue 810, and currentQueueLen 820 are all read directly from router/switch state 345.

Enqueue/Tag Decision

Once the DBL appropriate to the received packet is determined, the current (pre-enqueuing) stored buffer count 334 for the flow is compared to DBL (320 in FIG. 3). This count is retrieved from the indexed flow table entry, described above. If the buffer count is greater than DBL, the packet is tagged for further processing 340, detailed below. Otherwise, whenever the buffer count is less than or equal to DBL, the packet is enqueued 330.

In an alternate embodiment, a credit field is maintained in the flow table entry for each flow. The credit field is used to help decide whether a packet is enqueued or tagged.

In a further alternate embodiment, the decision to take any further action other than enqueuing is made based on a probability function. For example, a pseudo-random number (PRN) can be generated and compared to a set threshold value. If the PRN is greater than the threshold, the packet is enqueued without further processing or delay.

Enqueuing

When enqueuing, referring to FIG. 7, the buffer count stored in the indexed flow table entry is incremented by the packet's buffer requirement, which is simply the packet size 240 (FIG. 2), read from header 710 and converted into buffer cell units 720 (simply referred to as "buffers") as discussed above. If the buffer count is initially zero 723, the router state parameter flowsInQueue is incremented by one, 726, denoting a new flow.

A credit field may also be maintained in the flow table for each indexed flow table entry. In such an alternate embodiment, the credit value is incremented 740 on enqueuing; on marking or dropping, the credit value is decremented 680 (see FIG. 6). Once a flow exhausts its credits (or, alternately, reaches a minimum threshold credit level), a separate NAF limit, substantially less than the DBL, is enforced on that flow table entry in place of the DBL. Any new packet exceeding this NAF limit will be dropped. The credits give a flow several packets to respond to the initial packet drop before the flow is classified a NAF. For instance, a TCP flow over its dynamic buffer limit incurring a probability of drop of 0.1 could send roughly 30 packets after the first drop before exhausting its credits and being classified a NAF. The NAF limit can be computed as a function of DBL, such as DBL/4, bounded below by dblMin. Thus, some traffic from a NAF (e.g., small packets) will still be able to get through.

As an alternative to imposing a separate NAF limit, the DBL for the flow can be reduced.

A NAF must stay under the NAF limit for several successful queuing operations to build up enough credits to be reclassified as adapting (that is, a non-NAF) before it will be allowed more queue space. Thus, an unrepentant NAF will be held at the NAF buffer limit; no new packets will be enqueued until some are read out. To maintain fair resource allocation even to NAFs, the NAF limit should be set to give about the same throughput bandwidth as a normal, robust (i.e., adaptive) flow.

Note that the basic credit and NAF ramp-back scheme is desirable to ensure that both NAFs and "good" flows receive fair allocations of bandwidth. Without a ramp-back to the NAF limit, a NAF would end up with slightly fewer than DBL packets buffered, while robust flows would back off substantially on each drop, to the point where they would have an average number of packets buffered substantially below the soft limit (much less than DBL). Because the bandwidth provided to a flow going to a congested port is proportional to the number of buffers enqueued (for FIFO queuing, typical in internetworking systems), this situation results in far more bandwidth allocated to NAFs than to well-behaved flows. While it may be infeasible to provide equal bandwidth to the different types of flows, because flows vary in their reactions to drops depending on round-trip time and window limits, simulation results indicate that the DBL scheme of the present invention avoids the gross imbalances that might otherwise occur.

Tagging

If the packet is tagged 340, in one embodiment of the present invention it is dropped, i.e., not enqueued and
therefore not later transmitted to the destination address. In
an alternate embodiment, referring to FIG. 6, tagged packets 
are not dropped immediately, but tested, 610, first. If the 
packet qualifies for marking and enqueuing, a mark bit is set 
620 in the packet header (such as in the TOS field 210 or 
options field 230, FIG. 2) and the packet is enqueued 
normally as shown in FIG. 7 and described above. The mark 
bit tells subsequent recipients of the packet, be they other 
routers or switches in the network or the packet's ultimate 
destination, that the packet passed through congestion. Such 
a bit setting marking method is similar to using the ECN bit 
in the proposed IP version 6 (IPv6). 
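The enqueue/tag decision (FIG. 3), the enqueuing step (FIG. 7), and the optional credit field with a NAF limit can be sketched together as follows. This is a hypothetical Python sketch: the initial credit value, the credit cap, the reclassification threshold, and the choice to decrement credits at tagging time (rather than when the later mark or drop actually occurs) are assumptions made for brevity. The NAF limit uses the DBL/4 option, bounded below by dblMin, mentioned in the text.

```python
from dataclasses import dataclass

MAX_CREDITS = 8         # hypothetical cap on the credit field
RECLASSIFY_CREDITS = 4  # hypothetical credit level at which a NAF is reclassified


@dataclass
class FlowEntry:
    buffer_count: int = 0          # stored buffer count for the flow
    credits: int = MAX_CREDITS     # optional credit field (hypothetical initial value)
    is_naf: bool = False           # set once the flow exhausts its credits


@dataclass
class QueueState:
    flows_in_queue: int = 0        # router/switch state parameter flowsInQueue


def buffers_needed(packet_size, cell_size):
    # Convert the packet size into buffer cells, rounding up (step 720).
    return -(-packet_size // cell_size)


def enqueue_or_tag(entry, state, packet_size, cell_size, dbl, dbl_min):
    """Compare the pre-enqueue buffer count to the limit; return "enqueue" or "tag"."""
    # A NAF is held to a reduced limit: DBL/4 bounded below by dblMin.
    limit = max(dbl // 4, dbl_min) if entry.is_naf else dbl
    if entry.buffer_count > limit:
        entry.credits = max(entry.credits - 1, 0)   # decrement on mark/drop (680)
        if entry.credits == 0:
            entry.is_naf = True
        return "tag"
    if entry.buffer_count == 0:
        state.flows_in_queue += 1                   # first packet of a new flow (723, 726)
    entry.buffer_count += buffers_needed(packet_size, cell_size)
    entry.credits = min(entry.credits + 1, MAX_CREDITS)  # increment on enqueue (740)
    if entry.is_naf and entry.credits >= RECLASSIFY_CREDITS:
        entry.is_naf = False                        # reclassified as adapting
    return "enqueue"


flow, queue = FlowEntry(), QueueState()
print([enqueue_or_tag(flow, queue, 100, 64, dbl=4, dbl_min=2) for _ in range(4)])
# ['enqueue', 'enqueue', 'enqueue', 'tag']
```

Each 100-byte packet needs two 64-byte cells here, so the fourth packet finds a stored count of 6, above a DBL of 4, and is tagged.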

Alternatively, a backchannel message can be sent to the
source address to indicate that congestion is beginning to 
occur and that the source should consider ramping back its 
flow. Backchannel messages may be sent using the well- 
known Internet Control Message Protocol (ICMP), for 
example. 

If, however, the packet does not qualify for marking in 
step 610, it is dropped 650. 

In a further alternate embodiment, whether a tagged
packet is subsequently dropped is determined
probabilistically, i.e., by random selection.

In a still further embodiment, a tagged packet may be
forwarded to another software process for additional action 
instead of being dropped or marked/enqueued. 
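A hypothetical sketch of tagged-packet handling (FIG. 6) follows. The criterion for qualification test 610 is not specified in the text, so it is modeled here as a boolean can_mark; the probabilistic-drop variant is modeled with a drop probability, and a tagged packet that survives the random draw is assumed to be enqueued. Forwarding to another software process is not modeled.

```python
import random


def handle_tagged(can_mark, drop_probability=1.0, rng=random.random):
    """Decide what happens to a tagged packet (step 340 onward, FIG. 6)."""
    if can_mark:
        # Qualification test 610 passed: set the mark bit (620) and enqueue normally.
        return "mark_and_enqueue"
    if rng() < drop_probability:
        return "drop"      # step 650
    # Probabilistic variant: the packet survives this draw and is enqueued.
    return "enqueue"
```

With the default drop_probability of 1.0 the function reduces to the deterministic embodiment, because random.random() always returns a value below 1.0.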
Transmission of Enqueued Packets 

Of course, all routers and switches must also transmit the 
data they receive. Referring to FIG. 1, data is read out from 
the queue or queues 40 assigned to each output port 70 in a 
manner well-known in the art. FIG. 4 shows the steps within
the transmission process according to the present invention,
as more particularly described below.

The packet is transmitted into the network by the switch/ 
router at step 410. Next, packet descriptor 420 is read from 
the transmit (output) queue. The index, stored in the packet 
descriptor, is read 430 to enable access to flow table 335. 

With the hash embodiment of the mapping to a flow 
index, storing the index allows the hash seed or function to 
be changed without producing an incorrect flow entry count. 
With the embodiment using a CAM or a cache, storing the 
index allows the entries in the CAM or cache to be moved 
without producing an incorrect flow entry count. 

As an alternative embodiment, the index can be 
re-determined from the packet header on transmission rather 
than storing the index in the transmit queue. To avoid 
incorrect flow entry access in this embodiment, a short 
generation number can be stored in the transmit queue 
associated with the packet indicating the version of the 
mapping used by this packet and this version of the mapping 
is then used on transmit to regenerate the same index. In
particular, in the case of hashing, the generation number can 
indicate the previous hash seed that was used. As a simpli- 
fied alternative, the mapping can simply be required to 
remain unchanged after initialization until there are no 
packets enqueued in the switch. 
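The generation-number alternative can be sketched as a mapper that remembers recent hash seeds. The use of zlib.crc32 with the seed as its starting value, the two-bit generation number, and the class name are assumptions for illustration; any hash that can be re-run with a stored seed would serve.

```python
import zlib


class FlowMapper:
    """Hash mapping from header bytes to a flow-table index, versioned by a generation number."""

    def __init__(self, table_size, seed, generation_bits=2):
        self.table_size = table_size
        self.num_generations = 1 << generation_bits
        self.seeds = {0: seed}      # generation number -> hash seed
        self.generation = 0         # generation used for newly enqueued packets

    def index(self, header, generation=None):
        # On transmit, pass the generation stored with the packet to rebuild its index.
        gen = self.generation if generation is None else generation
        return zlib.crc32(header, self.seeds[gen]) % self.table_size

    def reseed(self, new_seed):
        # Generation numbers wrap around: an old generation may be reused only
        # after no queued packet still carries it.
        self.generation = (self.generation + 1) % self.num_generations
        self.seeds[self.generation] = new_seed
```

A packet enqueued under generation 0 still maps to its original flow entry on transmit, even if the seed has been changed in the meantime.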

The stored buffer count field stored in the flow table entry 
is read 440 and decremented 450 by the appropriate number 
of buffers representing the packet removed for transmission. 
Recall again that the count of buffers stored in the flow table 
and the number of buffers in the enqueued packet are 
expressed in the same units, be they bytes or groups of bytes. 

If the stored buffer count field reaches zero, then no more 
packets from the flow remain in queue. Accordingly, test 460 
checks the post-decrement count and decrements 470 the 
router state variable flowsInQueue if count is zero. The 
process loops, 9988, as long as there are packets in queue for 
transmit. 
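The transmit-side bookkeeping of FIG. 4 (steps 430 to 470) reduces to a few lines. In this minimal sketch the flow table and router state are plain dictionaries, and the descriptor fields index and buffers are hypothetical names.

```python
def on_transmit(flow_table, state, descriptor):
    """FIG. 4, steps 430-470, for one packet read from the transmit queue."""
    entry = flow_table[descriptor["index"]]          # index stored in the descriptor (430)
    entry["buffer_count"] -= descriptor["buffers"]   # read 440, decrement 450
    if entry["buffer_count"] == 0:                   # test 460: no packets of this flow left
        state["flows_in_queue"] -= 1                 # decrement 470
```

Because the same buffer units are used on enqueue and on transmit, the count returns exactly to zero when the flow's last queued packet leaves.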
Hard and Soft Limiting Alternate Embodiment

A further alternative embodiment implements soft and
hard dynamic buffer limits, referring to FIG. 9. Comparison
905 determines the subsequent steps. If the stored buffer
count in the flow table entry exceeds a soft limit value and
is less than a hard limit (greater than dblMin but less than
DBL) 920, the packet is tagged 340 as above. However, it is
then dropped or marked based on random selection, i.e.,
qualification step 610, FIG. 6, is based on a random selection
of mark or drop. Such probabilistic drop computations are
known in the art and commonly employed in RED-type
schemes. If the stored buffer count exceeds the hard limit
(DBL) 930, the incoming packet is dropped 650 and the
credit field for the flow in flow table 335 is decremented. For
simplicity, the soft limit may be set to a fraction of the hard
limit so that only the hard limit is non-trivially computed or
looked up. Of course, if the count is less than or equal to the soft
limit, the packet is enqueued 330 as above.
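The soft/hard limit test of FIG. 9 might look like the following sketch. The soft limit is taken as the hard limit shifted right (one way to keep it a cheap fraction of DBL), and the mark-versus-drop selection uses a fixed probability. The shift amount, the probability, and the handling of a count exactly equal to the hard limit are assumptions, since the text leaves them open.

```python
import random


def soft_hard_decision(buffer_count, hard_limit, soft_shift=1, mark_probability=0.5, rng=random.random):
    """FIG. 9 sketch: enqueue, random mark/drop, or drop, from the stored buffer count."""
    soft_limit = hard_limit >> soft_shift     # soft limit as a fixed fraction (here 1/2) of DBL
    if buffer_count <= soft_limit:
        return "enqueue"                      # step 330
    if buffer_count <= hard_limit:
        # Between the limits (920): tagged, then marked or dropped at random (610).
        # A count equal to the hard limit is placed in this band by assumption.
        return "mark" if rng() < mark_probability else "drop"
    return "drop"                             # above the hard limit (930): drop 650
```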

Once the packet is either enqueued or marked, the system
loops back to wait for and process the next packet received,
9999. In a substantially parallel process, enqueued packets
are transmitted out into the network via output ports 70
(FIG. 1), as described above.
DBL Computation Alternate Embodiments 

As discussed above, the DBL values can be computed
either in advance of the receipt of a packet (referring to FIG. 5)
or dynamically on-the-fly for each packet received (FIG. 8).
In either case, the same formula 510, given above, is used to
compute the DBL for a given queue at any instant in time.

In the case of the pre-computed table, a multi-dimensional
array is constructed, indexed by independent variables
flowsInQueue 810 and currentQueueLen 820 and containing
DBL values as a function of maxQueueLen 514 and the
noted independent variables. The size of this DBL lookup
table is therefore determined by the range of values for
flowsInQueue 512 and currentQueueLen 516, as set by
system limitations on the maximum number of recognized
flows allowed in any one queue and maxQueueLen, respectively.
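To make the table-size dependence concrete: the worst case described above, with both indices ranging up to maxQueueLen, and the 2,048-entry queue used as an example earlier gives 2,048 × 2,048 entries. The sketch below also shows one hypothetical way to shrink the table by quantizing both indices with a right shift, in the spirit of the earlier remark about picking a larger constant divisor. It assumes power-of-two limits, and the function names are illustrative.

```python
def dbl_table_entries(max_flows, max_queue_len, shift=0):
    # Each dimension shrinks by 2**shift when its index is quantized.
    return (max_flows >> shift) * (max_queue_len >> shift)


def dbl_table_index(flows_in_queue, current_queue_len, shift=0):
    # Map the 1-based state values onto 0-based, quantized (row, column) indices.
    return ((flows_in_queue - 1) >> shift, (current_queue_len - 1) >> shift)


print(dbl_table_entries(2048, 2048))           # 4194304
print(dbl_table_entries(2048, 2048, shift=4))  # 16384
```

As the text notes, each step of coarser quantization reduces the accuracy of the limit in exchange for a smaller table.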

In an alternate embodiment, DBL is computed factoring
in the round-trip time (RTT) of transmission of a drop or
mark notice and receipt of an adapted flow. This is done
because a robust flow with a long RTT needs additional
buffering at the congestion point to allow time for the source
to react to that congestion. In an internet service provider
(ISP) backbone environment, for example, this consider-
ation may be significant, given the large amount of buffering
required overall (due to the large number of flows) and the
wide variation in RTT per flow. The problem to be solved
then is to identify flows with long RTT and to adapt DBL
appropriately.

One approach is to generate an estimate of RTT based on
mapping the packet's source address (SA) and destination
address (DA) to a source autonomous system (AS) and a
destination AS, respectively, where the AS is a label refer-
ring to the group of routers and switches operating under
common control. Autonomous system grouping is com-
monly used in the art to refer to specific wide area or local
area networks (WANs or LANs). The estimate of RTT,
represented by a round-trip factor (RTF) of from 1 to n bits,
is then incorporated into the DBL computation 510 as
follows:

$$\text{DBL} = (\text{RTF} + 1) \times \frac{\text{maxQueueLen}}{\text{flowsInQueue}} \times \frac{K \times \text{maxQueueLen}}{\text{currentQueueLen}}$$
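Here is a minimal sketch of the RTF-scaled formula, using the one-bit local/remote classification discussed below as the RTF source. Mapping a source address to its AS (for example, from routing information) is assumed to happen elsewhere. The AS numbers in the example are arbitrary private-range values, and the dblMin/dblMax clamp is omitted.

```python
def rtf_one_bit(source_as, local_as):
    # One-bit RTF: 0 for a local (intra-AS) source, 1 for a remote (inter-AS) source.
    return 0 if source_as == local_as else 1


def dbl_with_rtf(rtf, max_queue_len, flows_in_queue, current_queue_len, k):
    # DBL = (RTF + 1) * (maxQueueLen / flowsInQueue) * (K * maxQueueLen / currentQueueLen)
    return (rtf + 1) * (max_queue_len / flows_in_queue) * (k * max_queue_len / current_queue_len)


local_dbl = dbl_with_rtf(rtf_one_bit(65001, 65001), 8, 2, 8, 1)
remote_dbl = dbl_with_rtf(rtf_one_bit(65002, 65001), 8, 2, 8, 1)
print(local_dbl, remote_dbl)  # 4.0 8.0
```

With a one-bit RTF the factor (RTF + 1) is either 1 or 2, so a remote flow's DBL is doubled, matching the one-bit embodiment.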

In the RTF-scaled formula, a two-bit RTF represents a coarse classification of
RTT into small, medium, large, and very large RTTs. One of
ordinary skill in the art will appreciate that any number of RTF bits could be used to provide a finer or coarser classification of RTT. For example, a one-bit RTF could be used simply to discriminate local (intra-AS) flows from remote (inter-AS) flows, so that DBL is doubled when the flow is from a remote source outside the immediate AS. Even the one-bit RTF implementation carries significant potential utility in that it provides a mechanism to avoid drops on potentially more costly or time-critical remote flows.

In a further alternate embodiment, a class of service/type of service-specific queue and scheduling mechanism provides enhanced immunity from flow latency disruptions due to NAFs. Flows of different kinds of data are directed to different, separate queues and receive forwarding transmission priority concomitant with their content. Differentiation includes class-specific modifications to the DBL so that higher priority flows receive more buffer space. In this embodiment, DBL is computed taking the precedence or type of service of a flow into account. As in the RTT alternative above, DBL is increased for higher precedence flows in order to prevent drops. One of ordinary skill in the art will appreciate that a variety of classification schemes to derive a "TOS factor" analogous to RTF above are possible. Accordingly, the present invention is not limited to one particular method of mapping type of service data, including but not limited to packet source address, destination address, or TOS field values, to a TOS factor for DBL value scaling.

Queuing Decision Alternate Embodiments

In an alternate embodiment to the invention described above, the decision to enqueue a packet is further conditioned on either the state of buffer and queue reserves or the class of service of the input packet. Class of service is sometimes referred to as type of service (TOS), reflecting the eponymous field in the IP packet header. In a further alternate embodiment, FIG. 10, both reserve state and TOS are factored into the queuing decision. Both alternatives rely on identifying the class of service in a given flow and its precedence (priority) relative to other flows.

In the reserve alternative, a device-wide reserve pool of buffer cells 1030 is maintained for each precedence level defined by TOS. A shared buffer pool is assumed for the device. (Recall from above that a buffer cell is the minimum unit of buffer space allocation. It may be a byte or a group of bytes.) Router/switch state parameter free cells 1002 is compared to reserve 1030 for the appropriate TOS, 1005. A packet is tagged 340, rather than tested against DBL 320, if the number of free cells on its arrival is less than the total reserve set aside for packets of higher precedence level. A much more limited scheme is used currently in the art to ensure that control packets (e.g., STP packets for layer 2 functions) are always transmitted and never dropped. In addition, a reserve of output queue space is also maintained for each precedence level. The process for testing free cells against queue reserves is the same: a packet is tagged 340 if the remaining space in the queue is less than the reserve set aside for higher precedence packets. Such a scheme has the advantage of enabling differentiated handling of packets of different precedence levels at a later processing point by preventing packet drop due to a lack of either buffers or queue space.

In the type of service queuing embodiment (not shown), a separate queue is provided for each class of service assigned to an output port, rather than a single queue or multiple undifferentiated queues per port. Each queue on a given port still uses the same flow table and the same per-queue, per-packet computations and enqueuing decisions discussed above. The improvement in performance comes from the fact that, with equal transmission scheduling among the active queues on a given port, a separate queue provides better service for flows mapped into it that present a smaller amount of data to the queue. Thus, a queue containing a class of service characterized by a small number of small flows will experience less delay in scheduled transmission. For example, with two queues and equal scheduling, if the high priority class of service queue has one fourth of the traffic, a high priority flow should experience half the delay of a normal priority flow. This is so because each queue receives half of the port bandwidth, but the high priority queue carries only one quarter of the traffic. In general, if a queue is allocated X percent of the port bandwidth and Y percent of the traffic, the delay should be reduced by a factor of X/Y relative to a single queue scheme. The amount of traffic to the port and the percentage of output port bandwidth allocated to each queue determine the delay reduction available to a given class of service.

Although TOS and class of service are described, those skilled in the art will realize that methods of determining precedence are not limited to the TOS field in an IP packet header. For example, precedence may be determined by reference solely to the source address of a packet. Accordingly, the invention is not limited to any particular method of precedence determination.

Conclusion

While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that changes and modifications may be made without departing from this invention in its broader aspects and, therefore, the appended claims are to encompass within their scope all such changes and modifications as fall within the true spirit and scope of this invention.

We claim:

1. A method of buffer management in a communication system comprising:

parsing an incoming packet comprising one or more fields to generate one or more extracted fields;

transforming said one or more extracted fields to generate an index;

reading a flow table entry identified by said index;

computing a dynamic buffer limit (DBL) value representing an upper bound of memory space to be allocated to the packet;

comparing said DBL value with the flow table entry; and

taking an action on the packet based on said comparing.

2. The method of claim 1 wherein said transforming comprises generating an index by hashing one or more of said extracted fields.

3. The method of claim 1 wherein said transforming comprises generating an index with a content-addressable memory (CAM) lookup.

4. The method of claim 1 wherein said transforming comprises generating an index with a cache lookup.

5. The method of claim 1 wherein said computing comprises:

loading a DBL lookup table with a plurality of DBL values, each DBL value singly corresponding to a unique combination of values of system state registers;

on the receipt of the incoming packet, transforming the values of the state registers into an index to said DBL lookup table; and

reading said DBL value corresponding to said index from said DBL lookup table.

6. The method of claim 5 wherein said computing further comprises scaling said DBL value by a factor.
7. The method of claim 1 wherein said computing comprises
calculating said DBL value with a scaling factor
reflecting a classification of the incoming packet. 

8. The method of claim 1 wherein said taking an action is
based on a probability function.

9. The method of claim 1 wherein said action is selected
from the group consisting of:

marking the packet if a stored buffer count within said
flow table entry is greater than said DBL value, wherein
said marking further comprises enqueuing the packet
and said enqueuing further comprises:

calculating a buffer requirement for the incoming
packet;

incrementing said stored buffer count by said buffer
requirement; and

storing an incremented stored buffer count into a stored
buffer count field;

enqueuing the packet if said stored buffer count is less
than said DBL value;

dropping the packet if said stored buffer count is greater
than said DBL value;

dropping the packet according to a probability function
wherein the probability of said dropping is less than
unity; and

forwarding the packet to a process for further processing;

and wherein said taking an action further comprises
updating said flow table entry.

10. The method of claim 9 wherein said action is based on 
a probability function. 

11. The method of claim 1 wherein said action comprises
reading a credit field in said flow table entry and wherein
said action is selected from the group consisting of:

marking the packet based in part on said credit field if a
stored buffer count within said flow table entry is
greater than said DBL value, wherein said marking
further comprises enqueuing the packet and said
enqueuing further comprises:

calculating a buffer requirement for the incoming
packet;

incrementing said stored buffer count by said buffer
requirement; and

storing an incremented stored buffer count into a stored
buffer count field;

marking the packet based in part on said credit field and
according to a probability function if a stored buffer
count within said flow table entry is greater than said
DBL value, wherein said marking further comprises
enqueuing the packet;

enqueuing the packet based in part on said credit field if
said stored buffer count is less than said DBL value;

dropping the packet based in part on said credit field if
said stored buffer count is greater than said DBL value;

dropping the packet based in part on said credit field and
according to a probability function wherein the
probability of said dropping is less than unity;

forwarding the packet to a process for further processing
based in part on said credit field;

and wherein said taking an action further comprises
updating said flow table entry.

12. The method of claim 11 wherein said updating comprises:

decrementing said credit field and testing said credit field
after said decrementing against a minimum threshold if
said action is marking; or incrementing said credit field.

13. The method of claim 12 wherein said DBL value is 
reduced prior to said testing. 



14. The method of claim 11 wherein said action is based 
on a probability function. 

15. A communication system comprising a network 
device, a plurality of input flows, and a plurality of output 
flows interoperably connected to each other, wherein said 
network device comprises: 

buffer manager circuitry receiving said plurality of input 
flows, said input flows each comprising a plurality of 
packets, said packets each comprising a plurality of 
fields; 

a buffer pool coupled to said buffer manager circuitry; 

a plurality of output queues coupled to said buffer pool; 
port scheduler circuitry coupled to said plurality of 
output queues and transmitting said plurality of output 
flows; and 

a controller coupled to said buffer manager circuitry, said 
buffer pool, said plurality of output queues, and said 
port scheduler circuitry, wherein said controller: 
maps each of said input flows into a flow table; 
computes a dynamic buffer limit (DBL) value repre- 
senting an upper bound on a number of buffers to be 
allocated to each packet; 
compares a stored buffer count to said DBL value; and 
acts on each packet based on said comparison. 

16. The communication system of claim 15 wherein said 
controller maps each of said input flows by hashing one or 
more of said fields. 

17. The communication system of claim 15 wherein said 
controller maps each of said input flows by generating an 
index with a content-addressable memory (CAM) lookup. 

18. The communication system of claim 15 wherein said 
controller maps each of said input flows by generating an 
index with a cache lookup. 

19. The communication system of claim 15 wherein said 
controller computes said dynamic buffer limit value by: 

loading a DBL lookup table with a plurality of DBL 
values, each DBL value singly corresponding to a 
unique combination of values of system state registers; 

on the receipt of an incoming packet, transforming the 
values of the state registers into an index to said DBL 
lookup table; and 

reading said DBL value corresponding to said index from 
said DBL lookup table. 

20. The communication system of claim 19 wherein said 
reading further comprises scaling said DBL value by a 
factor. 

21. The communication system of claim 15 wherein said 
controller computes said DBL value with a scaling factor 
reflecting a classification of an incoming packet. 

22. The communication system of claim 15 wherein said 
controller acts based on a probability function. 

23. The communication system of claim 15 wherein said 
acts are selected from the group consisting of: 

marking the packet if a stored buffer count within said 
flow table entry is greater than said DBL value, wherein 
said marking further comprises enqueuing the packet 
and said enqueuing further comprises: 
calculating a buffer requirement for the incoming 
packet; 

incrementing said stored buffer count by said buffer
requirement; and

storing an incremented stored buffer count into a stored
buffer count field;

enqueuing the packet if said stored buffer count is less
than said DBL value;

dropping the packet if said stored buffer count is greater
than said DBL value;

dropping the packet according to a probability function
wherein the probability of said dropping is less than
unity; and

forwarding the packet to a process for further processing;
and wherein said acts further comprise updating said flow
table entry.

24. The communication system of claim 23 wherein said 
acts are based on a probability function. 

25. The communication system of claim 15 wherein said
acts comprise reading a credit field in said flow table entry
and wherein said acts are selected from the group consisting
of:

marking the packet based in part on said credit field if a
stored buffer count within said flow table entry is
greater than said DBL value, wherein said marking
further comprises enqueuing the packet and said
enqueuing further comprises:

calculating a buffer requirement for the incoming
packet;

incrementing said stored buffer count by said buffer
requirement; and

storing an incremented stored buffer count into a stored
buffer count field;

marking the packet based in part on said credit field and
according to a probability function if a stored buffer
count within said flow table entry is greater than said
DBL value, wherein said marking further comprises
enqueuing the packet;

enqueuing the packet based in part on said credit field if
said stored buffer count is less than said DBL value;

dropping the packet based in part on said credit field if
said stored buffer count is greater than said DBL value;

dropping the packet based in part on said credit field and
according to a probability function wherein the
probability of said dropping is less than unity;

forwarding the packet to a process for further processing
based in part on said credit field;

and wherein said acts further comprise updating said flow
table entry.

26. The communication system of claim 25 wherein said
updating comprises:

decrementing said credit field and testing said credit field
after said decrementing against a minimum threshold
when one of said acts is one of marking the packet
based in part on said credit field, and marking the
packet based in part on said credit field and according
to a probability function; or incrementing said credit
field.

27. The communication system of claim 26 wherein said
DBL value is reduced prior to said testing.

28. The communication system of claim 25 wherein said 
acts are based on a probability function. 

29. A communication system comprising a network
device, a plurality of input devices, a plurality of buffers, and
a plurality of output queues interoperably connected to each
other; wherein said network device further comprises com-
puter instructions for:

parsing an incoming packet comprising one or more fields
to generate one or more extracted fields;

transforming said one or more extracted fields to generate
an index;

reading a flow table entry identified by said index;

computing a dynamic buffer limit (DBL) value represent-
ing an upper bound on a number of buffers to be
allocated to the packet;

comparing said DBL value with the flow table entry; and
taking an action on the packet based on said comparing.

30. The communication system of claim 29 wherein said 
transforming comprises generating an index by hashing one 
or more of said extracted fields. 

31. The communications system of claim 29 wherein said 
transforming comprises generating an index with a content- 
addressable memory (CAM) lookup. 

32. The communications system of claim 29 wherein said 
transforming comprises generating an index with a cache 
lookup. 

33. The communication system of claim 29 wherein said 
computing comprises: 

loading a DBL lookup table with a plurality of DBL 
values, each DBL value singly corresponding to a 
unique combination of values of system state registers; 

on the receipt of the incoming packet, transforming the 
values of the state registers into an index to said DBL 
lookup table; and 

reading said DBL value corresponding to said index from
said DBL lookup table.

34. The communication system of claim 33 wherein said 
computing further comprises scaling said DBL value by a 
factor. 

35. The communication system of claim 29 wherein said 
computing comprises calculating said DBL value with a 
scaling factor reflecting a classification of the incoming 
packet. 

36. The communication system of claim 29 wherein said 
taking an action is based on a probability function. 

37. The communication system of claim 29 wherein said 
action is selected from the group consisting of: 

marking the packet if a stored buffer count within said 
flow table entry is greater than said DBL value, wherein 
said marking further comprises enqueuing the packet 
and said enqueuing further comprises: 
calculating a buffer requirement for the incoming 
packet; 

incrementing said stored buffer count by said buffer 
requirement; and 

storing an incremented stored buffer count into a stored 
buffer count field; 
enqueuing the packet if said stored buffer count is less
than said DBL value;

dropping the packet if said stored buffer count is greater
than said DBL value;

dropping the packet according to a probability function
wherein the probability of said dropping is less than
unity; and

forwarding the packet to a process for further processing; 
and wherein said taking an action further comprises 
updating said flow table entry. 

38. The communication system of claim 37 wherein said 
action is based on a probability function. 
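The comparison and action of claims 37 and 38 can be sketched as follows. The sketch makes several assumptions. A count equal to the DBL is treated as over the limit, because the claims cover only "less than" and "greater than". Over-limit packets from markable flows are marked and enqueued. All other over-limit packets are dropped with probability `drop_prob`. `FlowEntry`, `Action`, and `take_action` are hypothetical names.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ENQUEUE = "enqueue"
    MARK = "mark"   # mark (e.g. a congestion bit) and still enqueue
    DROP = "drop"


@dataclass
class FlowEntry:
    buffer_count: int = 0   # buffers currently held by this flow in the queue


def take_action(entry: FlowEntry, dbl: int, pkt_buffers: int,
                can_mark: bool, drop_prob: float,
                rng: random.Random) -> Action:
    """Compare the flow's stored buffer count with its DBL and act on the packet.

    Assumptions: a count equal to the DBL counts as over the limit; over-limit
    packets from markable flows are marked, otherwise they are dropped with
    probability drop_prob (drop_prob < 1 gives a probabilistic drop).
    """
    over_limit = entry.buffer_count >= dbl
    if over_limit and not can_mark and rng.random() < drop_prob:
        return Action.DROP                 # flow table entry left unchanged
    action = Action.MARK if (over_limit and can_mark) else Action.ENQUEUE
    # Enqueue path (also taken by marked packets): charge the buffers to the flow.
    entry.buffer_count += pkt_buffers
    return action
```

With `drop_prob=1.0` the drop becomes deterministic, because `random.random()` returns values in [0, 1).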

39. The communication system of claim 29 wherein said 
action comprises reading a credit field in said flow table 
entry and wherein said action is selected from the group 
consisting of: 

marking the packet based in part on said credit field if a 
stored buffer count within said flow table entry is 
greater than said DBL value, wherein said marking 
further comprises enqueuing the packet and said 
enqueuing further comprises: 

calculating a buffer requirement for the incoming 
packet; 

incrementing said stored buffer count by said buffer
requirement; and
storing an incremented stored buffer count into a
stored buffer count field;

marking the packet based in part on said credit field and
according to a probability function if a stored buffer 
count within said flow table entry is greater than said 
DBL value, wherein said marking further comprises 
enqueuing the packet; 

enqueuing the packet based in part on said credit field if 
said stored buffer count is less than said DBL value; 

dropping the packet based in part on said credit field if 
said stored buffer count is greater than said DBL value; 

dropping the packet based in part on said credit field and 
according to a probability function wherein the prob- 
ability of said dropping is less than unity; 

forwarding the packet to a process for further processing 
based in part on said credit field; 

and wherein said taking an action further comprises 
updating said flow table entry. 

40. The communication system of claim 39 wherein said 
updating comprises: 

decrementing said credit field and testing said credit field 
after said decrementing against a minimum threshold if 
said action is marking; or 

incrementing said credit field. 

41. The communication system of claim 40 wherein said 
DBL value is reduced prior to said testing. 
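The credit-field update of claims 40 and 41 can be sketched as below. On a marking action the credit is decremented, the DBL is optionally reduced, and the credit is then tested against a minimum threshold. For other actions the credit is incremented. The claims do not say what follows a failed test, which actions trigger the increment, or whether the credit is bounded. Treating every non-marking action as an increment, capping the credit at `max_credit`, and returning an "exhausted" flag (which a caller might use to switch from marking to dropping) are therefore assumptions. `CreditEntry` and `update_credit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CreditEntry:
    credit: int   # hypothetical per-flow credit field in the flow table entry


def update_credit(entry: CreditEntry, action: str, dbl: int,
                  min_credit: int, max_credit: int,
                  dbl_reduction: int = 0) -> tuple[int, bool]:
    """Update the credit field after an action and report whether it is exhausted.

    On "mark": decrement the credit, reduce the DBL by dbl_reduction, then test
    the credit against min_credit. Any other action increments the credit,
    capped at max_credit (an assumption). Returns (DBL, exhausted flag).
    """
    if action == "mark":
        entry.credit -= 1
        dbl = max(1, dbl - dbl_reduction)      # reduce the DBL before testing
        return dbl, entry.credit < min_credit
    entry.credit = min(max_credit, entry.credit + 1)
    return dbl, False
```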

42. The communication system of claim 39 wherein said 
action is based on a probability function. 

43. A computer readable storage medium comprising 
computer instructions for:

parsing an incoming packet comprising one or more fields 
to generate one or more extracted fields; 

transforming said one or more extracted fields to generate 
an index; 

reading a flow table entry identified by said index; 

computing a dynamic buffer limit (DBL) value represent- 
ing an upper bound on a number of buffers to be 
allocated to the packet; 

comparing said DBL value with the flow table entry; and 

taking an action on the packet based on said comparing. 

44. The computer readable storage medium of claim 43 
wherein said transforming comprises generating an index by 
hashing one or more of said extracted fields. 

45. The computer readable storage medium of claim 43
wherein said transforming comprises generating an index
with a content-addressable memory (CAM) lookup.

46. The computer readable storage medium of claim 43
wherein said transforming comprises generating an index
with a cache lookup.

47. The computer readable storage medium of claim 43 
wherein said computing comprises: 

loading a DBL lookup table with a plurality of DBL 
values, each DBL value singly corresponding to a 
unique combination of values of system state registers; 

on the receipt of the incoming packet, transforming the 
values of the state registers into an index to said DBL 
lookup table; and 

reading said DBL value corresponding to said index from 
said DBL lookup table. 

48. The computer readable storage medium of claim 47 
wherein said computing further comprises scaling said DBL 
value by a factor. 

49. The computer readable storage medium of claim 43 
wherein said computing comprises calculating said DBL 
value with a scaling factor reflecting a classification of the 
incoming packet. 

50. The computer readable storage medium of claim 43 
wherein said taking an action is based on a probability 
function. 

51. The computer readable storage medium of claim 43 
wherein said action is selected from the group consisting of: 
marking the packet if a stored buffer count within said
flow table entry is greater than said DBL value, wherein
said marking further comprises enqueuing the packet
and said enqueuing further comprises:
calculating a buffer requirement for the incoming
packet;

incrementing said stored buffer count by said buffer
requirement; and
storing an incremented stored buffer count into a stored
buffer count field;

enqueuing the packet if said stored buffer count is less
than said DBL value;

dropping the packet if said stored buffer count is greater
than said DBL value;

dropping the packet according to a probability function
wherein the probability of said dropping is less than
unity; and

forwarding the packet to a process for further processing;
and wherein said taking an action further comprises
updating said flow table entry.

52. The computer readable storage medium of claim 51 
wherein said action is based on a probability function. 

53. The computer readable storage medium of claim 43 
wherein said action comprises reading a credit field in said
flow table entry and wherein said action is selected from the
group consisting of:

marking the packet based in part on said credit field if a
stored buffer count within said flow table entry is
greater than said DBL value, wherein said marking
further comprises enqueuing the packet and said
enqueuing further comprises:

calculating a buffer requirement for the incoming
packet;

incrementing said stored buffer count by said buffer
requirement; and
storing an incremented stored buffer count into a stored
buffer count field;

marking the packet based in part on said credit field and
according to a probability function if a stored buffer
count within said flow table entry is greater than said
DBL value, wherein said marking further comprises
enqueuing the packet;

enqueuing the packet based in part on said credit field if
said stored buffer count is less than said DBL value;

dropping the packet based in part on said credit field if
said stored buffer count is greater than said DBL value;

dropping the packet based in part on said credit field and
according to a probability function wherein the prob-
ability of said dropping is less than unity;

forwarding the packet to a process for further processing
based in part on said credit field;

and wherein said taking an action further comprises
updating said flow table entry.

54. The computer readable storage medium of claim 53 
wherein said updating comprises: 

decrementing said credit field and testing said credit field 
after said decrementing against a minimum threshold if 
said action is marking; or
incrementing said credit field. 

55. The computer readable storage medium of claim 54 
wherein said DBL value is reduced prior to said testing. 

56. The computer readable storage medium of claim 53 
wherein said action is based on a probability function.

* * * * * 